Practical linear-space Approximate Near Neighbors in high dimension

Authors

  • Georgia Avarikioti
  • Ioannis Z. Emiris
  • Ioannis Psarros
  • Georgios Samaras
Abstract

The c-approximate Near Neighbor problem in high-dimensional spaces has been mainly addressed by Locality Sensitive Hashing (LSH), which offers polynomial dependence on the dimension, query time sublinear in the size of the dataset, and subquadratic space requirement. For practical applications, linear space is typically imperative. Most previous work in the linear space regime focuses on the case that c exceeds 1 by a constant term. In a recently accepted paper, optimal bounds have been achieved for any c > 1 [ALRW17]. Towards practicality, we present a new and simple data structure using linear space and sublinear query time for any c > 1, including c → 1. Given an LSH family of functions for some metric space, we randomly project points to the Hamming cube of dimension log n, where n is the number of input points. The projected space contains strings which serve as keys for buckets containing the input points. The query algorithm simply projects the query point, then examines points which are assigned to the same or nearby vertices on the Hamming cube. We analyze in detail the query time for some standard LSH families. To illustrate our claim of practicality, we offer an open-source implementation in C++, and report on several experiments in dimension up to 1000 and n up to 10^6. Our algorithm is one to two orders of magnitude faster than brute force search. Experiments confirm the sublinear dependence on n and the linear dependence on the dimension. We have compared against the state-of-the-art LSH-based library FALCONN: our search is somewhat slower, but memory usage and preprocessing time are significantly smaller.
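
The data structure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' C++ implementation. It assumes Euclidean points, a standard random-projection LSH of the form floor((a·x + b) / w), and a lazily drawn random bit per hash value to obtain each Hamming coordinate. The class name `HammingCubeIndex`, the bucket width `w`, and the `max_flips` parameter (how many key bits may differ when visiting nearby vertices) are hypothetical choices made for illustration only.

```python
import itertools
import math
import random
from collections import defaultdict


class HammingCubeIndex:
    """Illustrative sketch: buckets keyed by vertices of a ceil(log2 n)-dimensional Hamming cube."""

    def __init__(self, points, w=4.0, seed=0):
        self.points = [tuple(p) for p in points]
        self.rng = random.Random(seed)
        self.w = w  # assumed LSH bucket width, not a value from the paper
        dim = len(self.points[0])
        # Key length: about log2(n) bits, following the abstract.
        self.k = max(1, math.ceil(math.log2(len(self.points))))
        # One Euclidean LSH function per key bit: h(x) = floor((a.x + b) / w).
        self.a = [[self.rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(self.k)]
        self.b = [self.rng.uniform(0.0, w) for _ in range(self.k)]
        # Per key bit, a lazily filled table mapping each hash value to a random bit.
        self.bits = [dict() for _ in range(self.k)]
        # Each point is stored exactly once, so the buckets use linear space.
        self.buckets = defaultdict(list)
        for idx, p in enumerate(self.points):
            self.buckets[self.key(p)].append(idx)

    def key(self, p):
        """Project a point to a vertex of the Hamming cube."""
        out = []
        for a, b, table in zip(self.a, self.b, self.bits):
            h = math.floor((sum(x * y for x, y in zip(a, p)) + b) / self.w)
            if h not in table:
                table[h] = self.rng.getrandbits(1)
            out.append(table[h])
        return tuple(out)

    def query(self, q, radius, max_flips=1):
        """Return the index of the closest candidate within `radius`, or None.

        Only buckets whose key differs from the query key in at most
        `max_flips` bits are examined.
        """
        base = self.key(q)
        best = None
        for flips in range(max_flips + 1):
            for positions in itertools.combinations(range(self.k), flips):
                vertex = list(base)
                for i in positions:
                    vertex[i] ^= 1
                for idx in self.buckets.get(tuple(vertex), ()):
                    d = math.dist(q, self.points[idx])
                    if d <= radius and (best is None or d < best[0]):
                        best = (d, idx)
        return None if best is None else best[1]
```

In this sketch, a larger `max_flips` visits more vertices and so trades query time for a higher chance of finding a near neighbor. The radius and approximation factor the paper analyzes would determine how many vertices need to be visited; the values here are placeholders.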


Similar resources

Low-Quality Dimension Reduction and High-Dimensional Approximate Nearest Neighbor

The approximate nearest neighbor problem (ε-ANN) in Euclidean settings is a fundamental question, which has been addressed by two main approaches: Data-dependent space partitioning techniques perform well when the dimension is relatively low, but are affected by the curse of dimensionality. On the other hand, locality sensitive hashing has polynomial dependence in the dimension, sublinear query...

Quantitative Analysis of Nearest-Neighbors Search in High-Dimensional Sampling-Based Motion Planning

We quantitatively analyze the performance of exact and approximate nearest-neighbors algorithms on increasingly high-dimensional problems in the context of sampling-based motion planning. We study the impact of the dimension, number of samples, distance metrics, and sampling schemes on the efficiency and accuracy of nearest-neighbors algorithms. Efficiency measures computation time and accuracy...

Graph-based time-space trade-offs for approximate near neighbors

We take a first step towards a rigorous asymptotic analysis of graph-based approaches for finding (approximate) nearest neighbors in high-dimensional spaces, by analyzing the complexity of (randomized) greedy walks on the approximate near neighbor graph. For random data sets of size n = 2^{o(d)} on the d-dimensional Euclidean unit sphere, using near neighbor graphs we can provably solve the approx...

Local Doubling Dimension of Point Sets

We introduce the notion of t-restricted doubling dimension of a point set in Euclidean space as the local intrinsic dimension up to scale t. In many applications information is only relevant for a fixed range of scales. We present an algorithm to construct a hierarchical net-tree up to scale t which we denote as the net-forest. We present a method based on Locality Sensitive Hashing to compute ...

Approximate line nearest neighbor in high dimensions

We consider the problem of approximate nearest neighbors in high dimensions, when the queries are lines. In this problem, given n points in R^d, we want to construct a data structure to support efficiently the following queries: given a line L, report the point p closest to L. This problem generalizes the more familiar nearest neighbor problem. From a practical perspective, lines, and low-dimensi...


Journal:
  • CoRR

Volume: abs/1612.07405

Publication date: 2016